• Premise of the study: We report 11 primer sets for nine single-copy nuclear genes in Streptanthus and other Thelypodieae
(Brassicaceae) and their utility for tribal-level and species-level phylogenetics in this poorly resolved group.

• Methods and Results: We selected regions based on a cross-referenced matrix of previous studies and public Brassica ex- 
pressed sequence tags. To design primers, we used alignments of low-depth-coverage Illumina sequencing of genomic DNA 
for two species of Brassica mapped onto Arabidopsis thaliana. We report several primer combinations for five regions that 
consistently amplified a single band and yielded high-quality sequences for at least 70% of the species assayed, and for four 
additional regions whose utility might be clade specific. 

• Conclusions: Our primers will be useful in improving resolution at shallow depths across the Thelypodieae, and likely in other 
Brassicaceae. 
Despite the great importance of members of the Brassicaceae
in agriculture and the extensive genomic resources available for 
Arabidopsis thaliana (L.) Heynh., our knowledge of phyloge- 
netic relationships within the family is still murky in many clades 
(Franzke et al., 2011). Among the hurdles to elucidating phylogenetic
relationships within this family are extensive gene duplication
and polyploidization, and past and recent hybridization
(Franzke et al., 2011). Main lineages have been identified using a
variety of regions (e.g., ITS, nad4, ndhF, phyA, Adh, chs, matK, 
rbcL, and trnLF). However, relationships at shallower levels 
(e.g., within some tribes or at the species level) are often charac- 
terized by poor resolution. New genomic tools have much to con- 
tribute to our understanding of evolution in Brassicaceae, but to 
date, technological, analytical, and logistical limitations have 
slowed down the wide-scale applicability of genomic approaches 
for phylogenetics (Egan et al., 2012). Thus, phylogenetic studies 
at the species level or of rapidly diversified groups still rely 
widely on single marker primer development. 

We developed a strategy to identify and design primers for 
single-copy nuclear genes (SCNGs) focusing on Streptanthus 
Nutt. and other members of the tribe Thelypodieae whose phy- 
logenetic relationships and circumscription have remained a 
challenge (Warwick et al., 2010). Our strategy combines Illumina
reads from genomic scans (low-depth-coverage sequencing
of total genomic DNA) and public expressed sequence tags
(ESTs) from Brassica L., a close relative to the Thelypodieae, 
with results from previous studies that identified putative 
SCNGs at wider taxonomic scales using algorithmic methods. 
We report several primer combinations that might be of utility 
for informing relationships within and across groups in the 
Thelypodieae. 

METHODS AND RESULTS 

Our approach to identify SCNGs is outlined in Fig. 1. We cross-referenced the
results of three previous studies that used algorithms to identify putative SCNGs
with published ESTs as follows: for the APVO loci (file 1471-2148-10-61-S1.xls
from Duarte et al., 2010), we kept only those loci reported to have introns and be
SCNGs in A. thaliana; for the COSII set (file available at:
http://solgenomics.net/documents/markers/cosii.xls), we included all that were
single-copy in A. thaliana; for the PPR genes (file NPH_2739_sm_TableS1.xls from
Yuan et al., 2009), we kept those unique in rice and Arabidopsis; and we included all
ESTs between B. napus and A. thaliana (Ilut and Doyle, 2012) after removing duplicates.
Our final matrix contained 10,817 entries (APVO, 5381; COSII, 2869; PPR, 90; ESTs,
2477), corresponding to 8010 distinct loci. The vast majority of these loci (5596;
69.86%) were represented by a single source, 25% (2025) were represented by two
sources, 5% (385) by three, and only 0.05% (4) were present in all four sources.
We selected loci for primer design at random and verified that the following four
criteria were met for each locus (if not, we picked another locus): (1) it was
identified as SCNG by multiple sources in the matrix above; (2) it was represented
by a single gene model in the A. thaliana genome (TAIR10); (3) it had an estimated
length of 600-1200 bp to allow for assembly from a single pass of Sanger sequencing;
and (4) it possessed 40-60% intron content to maximize potential phylogenetic utility
at species-level relationships (Rodriguez et al., 2009). In addition, we chose loci
to span all five A. thaliana linkage groups.
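
The cross-referencing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `source_support` and the toy locus IDs are hypothetical placeholders, not the study's data or code. The second half uses only the tallies reported above and checks that they are mutually consistent: the four support classes sum to 8010 loci, and weighting each class by its number of sources recovers the 10,817 entries.

```python
from collections import Counter


def source_support(sources):
    """Map each locus ID to the number of source lists that contain it."""
    support = Counter()
    for loci in sources.values():
        support.update(set(loci))  # set() drops duplicates within one source
    return support


# Hypothetical toy input; these locus IDs are placeholders, not real data.
toy_sources = {
    "APVO": {"AT1G00001", "AT1G00002", "AT1G00003"},
    "COSII": {"AT1G00002", "AT1G00003"},
    "PPR": {"AT1G00003"},
    "EST": {"AT1G00003", "AT1G00004"},
}
toy_support = source_support(toy_sources)
# Criterion (1) in the text: keep loci identified by more than one source.
multi_source = sorted(locus for locus, n in toy_support.items() if n >= 2)

# Consistency check on the tallies reported in the text.
entries_per_source = {"APVO": 5381, "COSII": 2869, "PPR": 90, "EST": 2477}
loci_by_support = {1: 5596, 2: 2025, 3: 385, 4: 4}

total_entries = sum(entries_per_source.values())
distinct_loci = sum(loci_by_support.values())
entries_from_support = sum(n * count for n, count in loci_by_support.items())
percent = {
    n: round(100 * count / distinct_loci, 2)
    for n, count in loci_by_support.items()
}
```

Counting support per locus, rather than intersecting the lists pairwise, keeps the sketch independent of how many sources are combined.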

We selected 15 loci for primer design. We designed primers based on align- 
ments of genomic scans generated from low-coverage Illumina sequencing of 
total genomic DNA (Illumina GAIIx [Illumina Inc., San Diego, California, USA], 

[Fig. 1 flowchart: (A) algorithmic studies identifying putative SCNGs, cross-referenced with public ESTs (A. thaliana vs. B. napus); (B) select loci for primer design*; (C) primer design using genomic scans (B. rapa and B. oleracea mapped onto A. thaliana); (D) test primers (in silico and in lab*); (E) select primers. *See text for criteria applied.]

Fig. 1. Approach used to identify SCNGs in this study. (A) Genes that were identified as putative SCNGs across different algorithmic studies were cross- 
referenced with public ESTs. (B) Loci for primer design were selected according to criteria outlined in the text. (C) Primer design was based on alignments 
of reads from shallow-depth Illumina sequencing of genomic DNA of Brassica rapa and B. oleracea (2x and 9x coverage, respectively) mapped onto the 
Arabidopsis thaliana genome. (D) Primers were tested in silico and in the laboratory before final selection for sequencing of products (E). 



80 bp reads) of B. rapa L. (2x coverage) and B. oleracea L. (9x coverage; 
L. Comai, unpublished data) mapped onto A. thaliana using the Burrows-Wheeler
Alignment tool (BWA) (Li and Durbin, 2009) and visualized in IGV version 1.5
(Thorvaldsdottir et al., 2013). We located the selected regions based on their
locus ID and followed standard primer design guidelines, aiming for primers with
a length of 22-25 bp, 40-60% GC content, a melting temperature (Tm) of 55-62°C,
a 3' GC clamp, and no repeats, runs, or secondary structures
such as hairpins, dimers, and cross-dimers. Prior to testing in the laboratory, we
tested primer performance in silico using Amplify 3X version 3.1.4
(http://engels.genetics.wisc.edu/amplify/). We designed 250 primer combinations for the 15
selected regions, and chose 52 to test in the laboratory.
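
The design guidelines above lend themselves to a simple automated screen. The following is a minimal sketch, not the procedure used in this study: the function name `check_primer`, the run limit of three identical bases, and the melting temperature approximation Tm = 64.9 + 41(nGC - 16.4)/N are all assumptions introduced for illustration, since the text does not state how Tm was estimated. Degenerate IUPAC bases are scored by their expected GC contribution, and secondary structures (hairpins, dimers) are not evaluated.

```python
import re

# Expected GC contribution of each IUPAC nucleotide code.
GC_WEIGHT = {
    "A": 0.0, "T": 0.0, "W": 0.0,
    "G": 1.0, "C": 1.0, "S": 1.0,
    "R": 0.5, "Y": 0.5, "K": 0.5, "M": 0.5, "N": 0.5,
    "B": 2 / 3, "V": 2 / 3, "D": 1 / 3, "H": 1 / 3,
}


def check_primer(seq, length=(22, 25), gc=(40.0, 60.0), tm=(55.0, 62.0), max_run=3):
    """Screen one primer (5'-3') against the length, GC, Tm, clamp, and run guidelines."""
    seq = seq.upper()
    if not seq or any(base not in GC_WEIGHT for base in seq):
        raise ValueError("primer must be a non-empty IUPAC nucleotide string")
    n = len(seq)
    n_gc = sum(GC_WEIGHT[base] for base in seq)
    gc_pct = 100.0 * n_gc / n
    # Assumed approximation; nearest-neighbor models are more accurate.
    tm_est = 64.9 + 41.0 * (n_gc - 16.4) / n
    # Length of the longest stretch of one repeated base.
    longest_run = max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))
    return {
        "length_ok": length[0] <= n <= length[1],
        "gc_ok": gc[0] <= gc_pct <= gc[1],
        "tm_ok": tm[0] <= tm_est <= tm[1],
        "gc_clamp": seq[-1] in "GCS",  # 3' terminal base is G or C
        "no_long_runs": longest_run <= max_run,
        "gc_percent": round(gc_pct, 1),
        "tm_estimate": round(tm_est, 1),
    }
```

Because the guidelines are stated as targets ("aiming for"), the function reports each criterion separately instead of returning a single pass or fail verdict.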



Between one and five primer pairs for each of 15 selected regions were 
tested for single-band amplification in a set of taxa spanning several genera in
the Thelypodieae. Here, we report statistics on primer combinations that consis- 
tently yielded a single band and whose product generated a clean sequence in at 
least 70% of taxa tested (five loci), as well as sequences for a few primer sets 
that could be of utility with additional optimization or in a different subset of 
taxa (Tables 1 and 2). 

Laboratory — Genomic DNA was extracted from tissue dried in silica gel 
using either the cetyltrimethylammonium bromide (CTAB) method (Doyle and 
Doyle, 1987) or the DNeasy Plant Mini Kit (QIAGEN, Valencia, California,

Table 1. Primer regions that amplify a single band and yield clean sequences (first five) and others that might be useful on a clade-by-clade basis.

Locus ID^a    Primer set   Primer sequences (5'-3')          Length (bp)   Cycle^b    Max no. of bands
AT1G56590     702F/1535R   F: AARGAYAATTTCATCATTGTCTATGAG    900           PCR55      1
                           R: GCATCCATYTCTTCAACAAGGTCG
AT1G61620     F2/R2        F: GCAAAGACACTCGAAGAACA           1500          PCR60      1
                           R: AGGTTTGTCACACACAAGACTT
AT2G40600     F2/R2        F: TCAAATGAGACGACAATGGTT          1300          TDPCR56    1
                           R: CACCTCCTTTGCTTTGTTTAC
AT4G34700     F1/R1        F: GAAGTTCAATGTCAACCAAGATG        710           ANNEXT63   1
                           R: ACCATGAGCAATCAGTTTGT
AT5G25630     F2/R2        F: RAAGAAGGTGGAGGAGGCGT           450           TDPCR56    1
                           R: ACATCTGCCTTCACSTTACACTC
AT5G27620^c   F1/R1        F: GAAGTTCAATGTCAACCAAGATG        1500          TDPCR54    2
                           R: ACCATGAGCAATCAGTTTGT
AT3G03100^c   F3/R3        F: CCTACCAGATGGGAACCTCT           1500          TDPCR54    2
                           R: GTACCGAGTCCAGTTCTTTTG
AT3G03100^c   F4/R3        F: CATAACATAGGAGCGACACTT          950           TDPCR54    1
                           R: GTACCGAGTCCAGTTCTTTTG
AT1G50020     F3/R1        F: GTGTGGCTCTCCTGTATCGTTT         950           TDPCR54    1
                           R: CGCCGAGAGTTAAGACTATCAAT
AT5G26680     5F/9R        F: AAGAGACAGGAACTGGCTAAACG        600           PCR60      1
                           R: TGCAAAGTGCAGCACATTGC
AT5G26680     12F/13R      F: ACTGCTCTAAAGCTTATTCGCCAG       450           PCR60      1
                           R: AAAGTTTCGAGCTTCATTATATGG

^a Locus ID from The Arabidopsis Information Resource database (http://www.arabidopsis.org/).

^b PCR conditions are as follows: PCR55, PCR60: 94°C, 2:00; (94°C, 0:30; 55°C or 60°C, 1:10; 72°C, 2:00) 35x; final extension 72°C, 7:00. ANNEXT63:
94°C, 2:00; (94°C, 0:30; 63°C, 4:00) 30-34x; final extension 72°C, 7:00. TDPCR54: 94°C, 1:00; (94°C, 0:30; 58°C, 1:10; 72°C, 1:30) 1x; (94°C, 0:30;
56°C, 1:10; 72°C, 1:30) 1x; (94°C, 0:30; 54°C, 1:10; 72°C, 1:30) 32x; final extension 72°C, 7:00. TDPCR56: 94°C, 1:00; (94°C, 0:30; 58°C, 1:10; 72°C,
1:30) 1x; (94°C, 0:30; 56°C, 1:10; 72°C, 1:30) 33x; final extension 72°C, 7:00.

^c Some clade variation observed; additional optimization might be required in some cases.



USA). PCR reactions consisted of 5 µL of 5x Green GoTaq Reaction Buffer
(M791A; Promega Corporation, Madison, Wisconsin, USA), 0.5 µL dNTP
mix (10 mM each), 0.5 µL of each primer (10 µM), and 0.2 µL (5 units/µL)
of GoTaq (M3001; Promega Corporation) in a total volume of 25 µL.
Cycling conditions are presented in Table 1. Bidirectional sequencing was
performed at Beckman Coulter Genomics (Danvers, Massachusetts, USA).
When more than one band amplified, we isolated bands, reamplified, and
sequenced directly. If cloning was necessary, PCR products were gel-purified
(QIAquick Gel Extraction Kit, QIAGEN), ligated into pGEM-T Vector
(Promega Corporation), cloned into E. coli DH5α competent cells (Invitrogen,
Carlsbad, California, USA), reamplified (eight colonies per PCR product),
and sequenced.
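
When setting up many reactions, the per-reaction recipe above is usually scaled into a master mix. The sketch below shows that arithmetic under stated assumptions: the function names and the 10% pipetting overage are hypothetical, and because the text does not itemize template DNA or water, the remaining volume up to 25 µL is reported as one combined entry.

```python
# Per-reaction volumes (µL) as listed in the text; total reaction volume is 25 µL.
PER_REACTION_UL = {
    "5x Green GoTaq Reaction Buffer": 5.0,
    "dNTP mix (10 mM each)": 0.5,
    "forward primer (10 µM)": 0.5,
    "reverse primer (10 µM)": 0.5,
    "GoTaq polymerase (5 units/µL)": 0.2,
}
TOTAL_UL = 25.0
REMAINDER_LABEL = "water plus template DNA (not itemized in the text)"


def master_mix(n_reactions, overage=0.10):
    """Scale the recipe to n reactions; overage is an assumed pipetting allowance."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    scale = n_reactions * (1.0 + overage)
    mix = {name: round(vol * scale, 2) for name, vol in PER_REACTION_UL.items()}
    remainder = TOTAL_UL - sum(PER_REACTION_UL.values())
    mix[REMAINDER_LABEL] = round(remainder * scale, 2)
    return mix


def final_concentration(stock, volume_ul, total_ul=TOTAL_UL):
    """Final concentration after dilution, in the same unit as the stock."""
    return stock * volume_ul / total_ul


dntp_final_mM = final_concentration(10.0, 0.5)   # each dNTP, mM
primer_final_uM = final_concentration(10.0, 0.5)  # each primer, µM
taq_units = 5.0 * 0.2                             # units of polymerase per reaction
```

With these volumes, each reaction works out to 1x buffer, 0.2 mM of each dNTP, 0.2 µM of each primer, and 1 unit of polymerase.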



Table 2. Summary of parsimony-informative characters for those regions 
for which we obtained sequence data (due to financial limitations we 
could only sequence a reduced number of amplicons). For those taxa 
where cloning (see Appendix 1) was necessary, the allele that yielded 
the shortest tree was selected. 



Region       # Tips   # Chars   CTE    # Var   % Var   nPIC   PIC   % PIC
AT2G40600    10       1569      1178   391     24.9    229    162   10.3
AT5G25630    10       296       259    37      12.5    23     14    4.7
AT4G34700    10       722       509    213     29.5    180    33    4.6
AT1G56590    11       891       646    245     27.5    199    46    5.2
AT1G61620    9        1530      1331   199     13.0    146    53    3.5
ITS^a        11       681       587    94      13.8    69     25    3.7

Notes: # Tips = number of tips or accessions; # Chars = total number of
characters; CTE = number of constant characters; # Var = number of
variable characters; % Var = percentage of total characters that are variable;
nPIC = number of nonparsimony informative characters; PIC = number of
characters that are parsimony informative; % PIC = percentage of total
characters that are parsimony informative.

^a ITS is included as a reference.
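
The columns of Table 2 are related by simple identities: total characters = constant + variable, variable = nPIC + PIC, and each percentage is taken over total characters. A short sketch, using only the values reported in Table 2, confirms that every row satisfies them (the function name `check_row` and the dictionary layout are illustrative choices).

```python
# Values transcribed from Table 2:
# (chars, constant, variable, pct_var, npic, pic, pct_pic)
TABLE2 = {
    "AT2G40600": (1569, 1178, 391, 24.9, 229, 162, 10.3),
    "AT5G25630": (296, 259, 37, 12.5, 23, 14, 4.7),
    "AT4G34700": (722, 509, 213, 29.5, 180, 33, 4.6),
    "AT1G56590": (891, 646, 245, 27.5, 199, 46, 5.2),
    "AT1G61620": (1530, 1331, 199, 13.0, 146, 53, 3.5),
    "ITS": (681, 587, 94, 13.8, 69, 25, 3.7),
}


def check_row(chars, constant, variable, pct_var, npic, pic, pct_pic):
    """Return True if one row of Table 2 is internally consistent."""
    sums_ok = constant + variable == chars and npic + pic == variable
    # Reported percentages are rounded to one decimal place.
    var_ok = abs(100.0 * variable / chars - pct_var) <= 0.05
    pic_ok = abs(100.0 * pic / chars - pct_pic) <= 0.05
    return sums_ok and var_ok and pic_ok


consistent = {region: check_row(*row) for region, row in TABLE2.items()}
```

For example, AT2G40600 has 1178 + 391 = 1569 characters, 229 + 162 = 391 variable characters, and 162/1569 = 10.3% parsimony-informative characters.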



Sequences were assembled and edited in Sequencher version 4.7 (Gene 
Codes Corporation, Ann Arbor, Michigan, USA). Potential PCR recombinants, 
assessed by manual examination of the sequences, were excluded. Alignment 
was performed manually in MacClade version 4.08 (Maddison and Maddison, 
2002), and the proportion of informative characters was calculated in PAUP* version
4.0b10 (Swofford, 2002).
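
For readers who want to reproduce this kind of summary outside PAUP*, the site categories can be counted directly from an alignment. The sketch below is not PAUP*'s implementation; it assumes the standard definition of a parsimony-informative site (at least two character states, each present in at least two sequences) and makes two simplifications that are labeled in the code: gaps, "?" and "N" are treated as missing data, and any other ambiguity code is counted as its own state. The function name and the toy alignment are hypothetical.

```python
from collections import Counter


def classify_sites(alignment, missing="-?N"):
    """Count constant, variable-uninformative, and parsimony-informative columns.

    alignment: dict mapping a taxon name to an aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    counts = {"constant": 0, "uninformative": 0, "informative": 0}
    for column in zip(*alignment.values()):
        # Simplification: gaps and missing symbols are ignored, not scored as states.
        states = Counter(b.upper() for b in column if b.upper() not in missing)
        if len(states) <= 1:
            counts["constant"] += 1
        elif sum(1 for n in states.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["uninformative"] += 1
    return counts


# Hypothetical toy alignment, not data from this study.
toy = {
    "taxon_A": "ACGTA",
    "taxon_B": "ACGTC",
    "taxon_C": "ACTTC",
    "taxon_D": "AGTTA",
}
toy_counts = classify_sites(toy)
toy_pct_pic = 100.0 * toy_counts["informative"] / len(toy["taxon_A"])
```

In the toy alignment, columns 1 and 4 are constant, column 2 is variable but uninformative (one sequence differs), and columns 3 and 5 are parsimony informative.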

We have corroborated the utility of the SCNGs reported here by using a 
subset to estimate phylogenies of the "Streptanthoid" complex and its allies, a
group that has been subject to several substantial taxonomic revisions and 
whose phylogenetic relationships have remained poorly understood. While 
these results are beyond the scope of this paper and will be reported separately 
(Cacho et al., in prep.), given the level of phylogenetically informative variation 
that we observe (Table 2; Appendix 1) we have confidence that the SCNGs we 
contribute here will be useful to infer species-level phylogenies in several 
clades of the Thelypodieae and potentially of the Brassicaceae as a whole. 
These improved phylogenies could be an important stepping stone to facilitate 
comparative evolutionary studies in these clades until technological advances 
allow straightforward implementation of new sequencing technologies for low- 
cost phylogenetic studies. 



CONCLUSIONS 

Using a strategy that combines results from previous algorith- 
mic studies identifying putative SCNGs with genomic resources 
from published ESTs and Illumina genomic scans, we have iden- 
tified and designed primers for several SCNGs that are of phy- 
logenetic utility. Our primers yield sequences that are informative 
for phylogenies at and above the species level in most species 
of Thelypodieae and Sisymbrieae we tested, including, when possible,
two or more species of Streptanthus, Streptanthella Rydb.,
Caulanthus S. Watson, Guillenia Greene, Stanleya Nutt., Sisymbrium L.,
Thelypodium Endl., and Thysanocarpus Hook. Given
that we designed primers based on Arabidopsis and Brassica 
sequences, they are also likely to be useful for understanding
relationships among members of Camelineae, and potentially
across Brassicaceae as a whole.

Applications in Plant Sciences 2013 1(7): 1200002
doi:10.3732/apps.1200002
Cacho and Strauss: New primers in Brassicaceae
http://www.bioone.org/loi/apps

Appendix 1. GenBank accession and population information for the specimens used in this study. All vouchers are deposited at the University of California,
Davis herbarium (DAV).

Taxon: Caulanthus inflatus S. Watson
Collection and locality information: NIC-S-066, Ballinger Canyon, CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-066, KC517428 (b1); AT2G40600: NIC-S-066, KC517426; AT1G56590: NIC-S-066, KC517439; AT1G61620: NIC-S-066, KC517461 (b1a1); AT5G25630: NIC-S-066, KC517408; ITS: NIC-S-066, KC517450

Taxon: Streptanthus breweri A. Gray
Collection and locality information: KBR-020A, Knoxville-Berryessa Rd. at Hwy. 128, CA, USA
GenBank accession no.^a: AT4G34700: KBR-020A, KC517429; AT2G40600: KBR-020A, KC517419; AT1G56590: KBR-020A, KC517440; AT1G61620: KBR-020A, KC517462; AT5G25630: KBR-020A, KC517409; ITS: KBR-020A, KC517451

Taxon: Streptanthus polygaloides A. Gray
Collection and locality information: WAR-045, Washington Rd., CA, USA; SKY-082, Skyway Ave., CA, USA
GenBank accession no.^a: AT4G34700: SKY-082, KC517437; AT2G40600: WAR-045, KC517422; AT1G56590: WAR-045, KC517441; AT1G61620: WAR-045, KC517463; AT5G25630: WAR-045, KC517410; ITS: WAR-045, KC517452

Taxon: Streptanthella longirostris (S. Watson) Rydb.
Collection and locality information: NIC-S-020, Coyote Canyon, CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-020, KC517430; AT2G40600: NIC-S-020, KC517427; AT1G56590: NIC-S-020, KC517442; AT1G61620: NIC-S-020, KC517468; AT5G25630: NIC-S-020, KC517411; ITS: NIC-S-020, KC517453

Taxon: Sisymbrium irio L.
Collection and locality information: NIC-S-022, Coyote Canyon, CA, USA; LLSP-004, Lleida, Spain
GenBank accession no.^a: AT4G34700: LLSP-004, KC517436; AT2G40600: NIC-S-022, KC517424; AT1G56590: LLSP-004, KC517449; AT1G61620: na; AT5G25630: NIC-S-022, KC517412; ITS: NIC-S-022, KC517454

Taxon: Caulanthus coulteri S. Watson
Collection and locality information: NIC-S-001, Caliente-Bodfish Rd., CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-001, KC517431 (b2.a1); AT2G40600: NIC-S-001, KC517425; AT1G56590: NIC-S-001, KC517443; AT1G61620: NIC-S-001, KC517464; AT5G25630: NIC-S-001, KC517413; ITS: NIC-S-001, KC517455

Taxon: Stanleya pinnata (Pursh) Britton
Collection and locality information: NIC-S-054, trail off Hwy. 10, CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-054, KC517432; AT2G40600: na; AT1G56590: NIC-S-054, KC517445; AT1G61620: NIC-S-054, KC517469; AT5G25630: KC517415; ITS: NIC-S-054, KC517457

Taxon: Streptanthus diversifolius S. Watson
Collection and locality information: NIC-S-085, Table Mountain, CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-085, KC517433 (b2.a2); AT2G40600: NIC-S-085, KC517421; AT1G56590: NIC-S-085, KC517446; AT1G61620: NIC-S-085, KC517465; AT5G25630: NIC-S-085, KC517416; ITS: NIC-S-085, KC517458

Taxon: Streptanthus drepanoides Kruckeb. & J. L. Morrison
Collection and locality information: LSAD-027A, Lime Saddle, CA, USA
GenBank accession no.^a: AT4G34700: LSAD-027A, KC517434; AT2G40600: LSAD-027A, KC517420; AT1G56590: LSAD-027A, KC517447; AT1G61620: LSAD-027A, KC517466; AT5G25630: LSAD-027A, na; ITS: LSAD-027A, KC517459

Taxon: Caulanthus hallii Payson
Collection and locality information: NIC-S-015, trail off Hwy. 8, CA, USA; NIC-S-285, trail off Hwy. 78, CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-015, KC517435; AT2G40600: NIC-S-015, KC517418; AT1G56590: NIC-S-285, KC517448; AT1G61620: NIC-S-015, KC517467; AT5G25630: NIC-S-015, KC517417; ITS: NIC-S-015, KC517460

Taxon: Caulanthus cooperi (S. Watson) Payson
Collection and locality information: NIC-S-055, trail off Hwy. 10, CA, USA; NIC-S-007, trail off Hwy. 78, CA, USA
GenBank accession no.^a: AT4G34700: NIC-S-007, KC517438; AT2G40600: NIC-S-055, KC517423; AT1G56590: NIC-S-055, KC517444; AT1G61620: na; AT5G25630: NIC-S-055, KC517414; ITS: NIC-S-055, KC517456

Notes: NIC = N. Ivalu Cacho; SYS = Sharon Y. Strauss; na = not available.

^a The few cases in which two bands were amplified are noted, as is the allele the sequence corresponds to.
^b Individuals grown from seed are given codes according to their populations.